Bandwidth compression utilizing magnitude and phase coded signals representative of the input signal



United States Patent Office
3,360,610
Patented Dec. 26, 1967

BANDWIDTH COMPRESSION UTILIZING MAGNITUDE AND PHASE CODED SIGNALS REPRESENTATIVE OF THE INPUT SIGNAL

James L. Flanagan, Warren Township, Somerset County, N.J., assignor to Bell Telephone Laboratories, Incorporated, New York, N.Y., a corporation of New York

Filed May 7, 1964, Ser. No. 365,587. 3 Sheets of drawings.

5 Claims. (Cl. 179-15.55)

ABSTRACT OF THE DISCLOSURE

For each of a preselected plurality of frequencies, a pair of signals is generated which represents the real and imaginary components of the short-time Fourier transform of an incoming signal. From each pair of signals a set of control signals is derived that represents the magnitude and the time derivative of the phase angle of the short-time Fourier transform. The control signals are transmitted to a receiver where a replica of the original signal is produced by modulating a plurality of cosine generators.

This invention relates to the transmission of human speech in coded form, and in particular to systems for transmitting human speech in coded form in order to conserve transmission channel bandwidth.

Conventional speech communication systems, for example, commercial telephone systems, typically convey human speech by transmitting an electrical facsimile of the acoustic wave form produced by a human talker. Because of the redundancy of human speech, however, facsimile transmission is a relatively inefficient way to transmit speech information, and it is well known that the information contained in a typical speech sound may be transmitted over a channel of substantially narrower bandwidth than that required for facsimile transmission of the speech wave form.

A number of arrangements for compressing or reducing the amount of bandwidth employed in the transmission of speech information have been proposed, one of the best known of these arrangements being the so-called channel vocoder, a description of which may be found in an article by E. E. David, Jr., entitled, Signal Theory in Speech Transmission, vol. CT-3, IRE Transactions on Circuit Theory, page 232 (1956).

As pointed out in the above-mentioned David article, the channel vocoder represents speech by a number of points on its short-time power spectrum and by a voiced-unvoiced pitch signal representative of the characteristics of the excitation source applied to the talker's vocal tract. It has been recognized, however, that efficient, accurate representation of the excitation source characteristics has been difficult to achieve and that errors in determining these characteristics impair the intelligibility and quality of speech synthesized in channel vocoders. An example of an arrangement that avoids the difficulties inherent in determining excitation source characteristics is described in R. L. Miller Patent 2,953,644, issued Sept. 20, 1960.

The present invention also avoids the difficulties inherent in determining excitation source characteristics by providing a speech communication arrangement in which speech is encoded in terms of a number of points on the short-time speech amplitude spectrum as well as an equal number of points on the time derivative of the short-time speech phase spectrum. At each of a plurality of predetermined frequencies which span the frequency range of an incoming speech signal there is obtained a pair of signals respectively representative of the real and imaginary parts of the short-time Fourier transform of the original speech signal. From each pair of signals representing the real and imaginary parts of the short-time Fourier transform at a predetermined frequency the present invention derives a pair of narrow band signals, one representing the magnitude of the short-time Fourier transform, or the value of the short-time amplitude spectrum, at that frequency, and the other representing the time derivative of the phase angle of the short-time Fourier transform, or the value of the time derivative of the short-time phase spectrum, at that frequency.

The pairs of narrow band signals may be transmitted from a transmitter terminal to a receiver terminal over a channel having a substantially smaller bandwidth than that required for facsimile transmission of the original speech signal. At the receiver terminal, there are generated a plurality of cosine waves having the same predetermined frequencies at which the short-time Fourier transform was evaluated, and each cosine wave is modulated in amplitude and phase angle by one of the pairs of narrow band signals. The modulated cosine waves are then combined to form a replica of the original speech signal.

The invention will be fully understood from the following detailed description of an illustrative embodiment thereof, taken in connection with the appended drawings, in which:

FIG. 1 is a block diagram showing a complete vocoder system embodying the principles of this invention;

FIG. 2 is a block diagram showing alternative apparatus embodying the principles of this invention; and

FIG. 3 is a diagram of assistance in explaining the features of this invention.

Theoretical considerations

It is well known from various listening tests that filtering a speech signal by parallel, contiguous bandpass filters does not significantly impair intelligibility or quality; that is, the original speech signal f(t) is approximated by

$$f(t) \approx \sum_{n=1}^{N} f_n(t) \tag{1}$$

where the $f_n(t)$ are the respective outputs of N bandpass filters, as shown in FIG. 3. The pass bands of such filters might be chosen, for example, as they are in a channel vocoder, that is, 15 channels with bandwidths of about 200 c.p.s. each, with center frequencies of 200, 400, ..., 3000 c.p.s. for n = 1, 2, ..., 15.

The total channel bandwidth required to transmit the N $f_n(t)$ signals, if they are not further processed, is identical to that required for the original signal f(t). However, by the present method it is possible to describe each $f_n(t)$ signal in terms of its short-time amplitude spectrum and in terms of its short-time phase spectrum. These two functions, suitably low-pass filtered, say to 30 c.p.s. each, can then be transmitted over a channel greatly restricted in bandwidth (for instance, a 60 c.p.s. bandwidth is used to transmit the information about each 200 c.p.s. segment of the original in the N = 15 example).
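The bandwidth figures in the preceding paragraph can be tallied with a few lines of Python. This is only an arithmetic sketch of the N = 15 example; the function name and its parameters are illustrative and do not appear in the patent.

```python
def bandwidth_budget(n_channels=15, channel_width_cps=200.0,
                     control_cutoff_cps=30.0, signals_per_channel=2):
    """Return (uncoded, coded, ratio) bandwidths in c.p.s.

    uncoded: total width of the N band-pass segments of the original signal
    coded:   total width of the low-pass filtered control signals
             (one amplitude and one phase signal per channel)
    """
    uncoded = n_channels * channel_width_cps
    coded = n_channels * signals_per_channel * control_cutoff_cps
    return uncoded, coded, uncoded / coded


uncoded, coded, ratio = bandwidth_budget()
print(f"uncoded: {uncoded:.0f} c.p.s., coded: {coded:.0f} c.p.s., ratio: {ratio:.2f}")
```

With the figures quoted in the text, each 200 c.p.s. segment is carried by 60 c.p.s. of control signals, so the whole 3000 c.p.s. band is carried by 900 c.p.s.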

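The required short-time amplitude and phase spectra are defined by the following relations. If the nth bandpass filter has an impulse response $[h(t)\cos\omega_n t]$, then the $f_n(t)$ signal is the convolution

$$f_n(t) = \int_{-\infty}^{t} f(\lambda)\,h(t-\lambda)\cos[\omega_n(t-\lambda)]\,d\lambda \tag{2}$$

where the function h(t) is the envelope of the bandpass filter impulse response and is also the impulse response of a realizable low-pass filter. In similar manner, a signal complementary to $f_n(t)$ is defined by the relation

$$\hat{f}_n(t) = \int_{-\infty}^{t} f(\lambda)\,h(t-\lambda)\sin[\omega_n(t-\lambda)]\,d\lambda \tag{3}$$

The complex combination $[f_n(t) + j\hat{f}_n(t)]$ can therefore be used to define a complex time function given by

$$f_n(t) + j\hat{f}_n(t) = \int_{-\infty}^{t} f(\lambda)\,h(t-\lambda)\,e^{j\omega_n(t-\lambda)}\,d\lambda \tag{4a}$$

$$= e^{j\omega_n t}\int_{-\infty}^{t} f(\lambda)\,h(t-\lambda)\,e^{-j\omega_n\lambda}\,d\lambda \tag{4b}$$

The right-hand integral of Equation 4b defines a short-time Fourier transform, $F(\omega_n,t)$, of the original signal f(t), that is,

$$F(\omega_n,t) = \int_{-\infty}^{t} f(\lambda)\,h(t-\lambda)\,e^{-j\omega_n\lambda}\,d\lambda \tag{5}$$

Since $F(\omega_n,t)$ is in general complex, it may be expressed in the polar form as

$$F(\omega_n,t) = |F(\omega_n,t)|\,e^{j\varphi(\omega_n,t)} \tag{6a}$$

or, dropping the explicit $(\omega_n,t)$ notation,

$$F(\omega_n,t) = |F_n|\,e^{j\varphi_n} \tag{6b}$$

where $|F_n|$ is the magnitude of $F(\omega_n,t)$ and $\varphi_n$ is the phase angle of $F(\omega_n,t)$.

The complex time function in Equation 4b can therefore be written

$$f_n(t) + j\hat{f}_n(t) = |F_n|\,e^{j(\omega_n t + \varphi_n)} \tag{7}$$

so that

$$f_n(t) = |F_n|\cos(\omega_n t + \varphi_n) \tag{8}$$

where $|F_n|$ and $\varphi_n$ are the short-time amplitude and phase spectra, respectively, evaluated at frequency $\omega_n$. The $|F_n|$ and $\varphi_n$ quantities both change with time, and the original signal is now represented in terms of N values each of the short-time amplitude and phase spectra, that is,

$$f(t) \approx \sum_{n=1}^{N} f_n(t) = \sum_{n=1}^{N} |F_n|\cos(\omega_n t + \varphi_n) \tag{9}$$

The factor $|F_n|$ therefore constitutes an amplitude modulation of the nth term of the series, and the factor $\varphi_n$ constitutes a phase modulation of the nth term.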
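The identity behind Equations 2 through 8 can be checked numerically. The following is a minimal discrete-time sketch, assuming unit sample spacing, a causal window h, and NumPy; the function names are hypothetical and the sums stand in for the integrals of the text. It computes the bandpass output directly and again from the magnitude and phase of the short-time transform.

```python
import numpy as np


def bandpass_output(f, h, omega):
    """Equation 2 in discrete form: f convolved with h[k] * cos(omega * k)."""
    k = np.arange(len(h))
    return np.convolve(f, h * np.cos(omega * k))[:len(f)]


def short_time_transform(f, h, omega):
    """Equation 5 in discrete form: sum over m of f[m] h[k-m] exp(-j omega m)."""
    m = np.arange(len(f))
    return np.convolve(f * np.exp(-1j * omega * m), h)[:len(f)]


def polar_reconstruction(F, omega):
    """Equation 8: |F_n| cos(omega_n t + phi_n), with t the sample index."""
    t = np.arange(len(F))
    return np.abs(F) * np.cos(omega * t + np.angle(F))


rng = np.random.default_rng(0)
f = rng.standard_normal(500)      # arbitrary test signal
h = np.hanning(64)                # assumed low-pass envelope
omega = 0.3                       # radians per sample

direct = bandpass_output(f, h, omega)
via_polar = polar_reconstruction(short_time_transform(f, h, omega), omega)
print(np.allclose(direct, via_polar))
```

The two routes agree to rounding error because the real part of $e^{j\omega_n t}F(\omega_n,t)$ is exactly the convolution of Equation 2.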

As shown by H. S. Black in Modulation Theory, Chapter 3, pages 27-30, an instantaneous frequency can be defined for the cosine factors of Equation 9 by the time derivative of the angle arguments, that is, by

$$\omega_i(t) = \dot{\theta}(t) \tag{10a}$$

$$= \omega_n + \dot{\varphi}_n \tag{10b}$$

where $\theta(t) = \omega_n t + \varphi_n$ is the argument and the dot signifies the time derivative. Through this definition, H. S. Black shows that phase and frequency modulation are angle modulations which are not essentially different. A frequency modulation of a carrier $\omega_n$ by the phase derivative $\dot{\varphi}_n$ is therefore equivalent to Equation 9, that is

$$f_n(t) = |F_n|\cos(\omega_n t + \varphi_n) \tag{11a}$$

$$f_n(t) = |F_n|\cos\left(\omega_n t + K\int^{t}\dot{\varphi}_n\,dt + \kappa\right) \tag{11b}$$

where the K and $\kappa$ of the right-hand side of Equation 11b are constants and are similar to the symbols of Equation (3-3) on page 28 of H. S. Black, Modulation Theory.
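The equivalence of phase modulation by the phase angle and frequency modulation by its time derivative can be illustrated numerically. This is a sketch only: it assumes K = 1, treats the additive constant as the initial phase, uses trapezoidal integration on sampled data, and the function names are not taken from the patent or from Black.

```python
import numpy as np


def angle_from_derivative(phase_derivative, dt, initial_phase=0.0):
    """Integrate the phase derivative (trapezoidal rule) to recover the phase."""
    steps = 0.5 * (phase_derivative[1:] + phase_derivative[:-1]) * dt
    return initial_phase + np.concatenate(([0.0], np.cumsum(steps)))


def frequency_modulated_term(magnitude, omega_n, phase_derivative, dt,
                             initial_phase=0.0):
    """|F_n| cos(omega_n t + integral of the phase derivative + constant)."""
    t = np.arange(len(phase_derivative)) * dt
    angle = angle_from_derivative(phase_derivative, dt, initial_phase)
    return magnitude * np.cos(omega_n * t + angle)


dt = 1.0 / 8000.0
t = np.arange(8000) * dt
phi = 0.8 * np.sin(2 * np.pi * 3 * t)                       # slowly varying phase
phi_dot = 0.8 * 2 * np.pi * 3 * np.cos(2 * np.pi * 3 * t)   # its exact derivative
omega_n = 2 * np.pi * 400

phase_modulated = np.cos(omega_n * t + phi)
freq_modulated = frequency_modulated_term(1.0, omega_n, phi_dot, dt, phi[0])
print(np.max(np.abs(phase_modulated - freq_modulated)))
```

The residual printed at the end reflects only the error of the numerical integration; the two waveforms are otherwise the same signal.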

The factors $|F_n|$ and $\dot{\varphi}_n$ therefore completely specify the nth component signal $f_n(t)$. These factors vary relatively slowly with time and may be low-pass filtered before transmission. Each pair of $|F_n|$ and $\dot{\varphi}_n$ signals may then be used at the receiver in a combined amplitude and frequency modulation to recover a close approximation to the original f(t).

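In order to measure each $|F_n|$ and $\dot{\varphi}_n$ we first notice from Equations 5 and 6b that $|F_n|e^{j\varphi_n}$ may be expressed in terms of cosine and sine transforms,

$$|F_n|\,e^{j\varphi_n} = \left[\int_{-\infty}^{t} f(\lambda)\,h(t-\lambda)\cos\omega_n\lambda\,d\lambda \;-\; j\int_{-\infty}^{t} f(\lambda)\,h(t-\lambda)\sin\omega_n\lambda\,d\lambda\right] \tag{12a}$$

$$= a(\omega_n,t) - j\,b(\omega_n,t) \tag{12b}$$

where $a(\omega_n,t)$ and $b(\omega_n,t)$ are respectively the real and imaginary parts of $F(\omega_n,t)$ (the latter with its sign reversed, as Equation 12b shows). The $|F_n|$ and $\varphi_n$ parameters are therefore given by

$$|F_n| = \left(a^2 + b^2\right)^{1/2} \tag{13a}$$

$$\varphi_n = \tan^{-1}\left(-\frac{b}{a}\right) \tag{13b}$$

Measures of $a(\omega_n,t)$ and $b(\omega_n,t)$ can be implemented by noticing that $a(\omega_n,t)$ is the convolution of the product function $[f(t)\cos\omega_n t]$ with the low-pass window h(t), and $b(\omega_n,t)$ is the convolution of the product function $[f(t)\sin\omega_n t]$ with the same low-pass function h(t).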
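The measurement just described (multiply by cosine and sine, low-pass filter, then convert to magnitude and phase) can be sketched for sampled data as follows. This is an illustration under stated assumptions, not the patented circuit: the window h is taken to be a causal array, the integrals become sums scaled by the sample interval dt, and `np.arctan2` is used as a four-quadrant stand-in for the inverse tangent of Equation 13b. The function name is hypothetical.

```python
import numpy as np


def analyze_channel(f, h, omega_n, dt):
    """Return a, b, |F_n| and phi_n for one analysis frequency omega_n (rad/s)."""
    t = np.arange(len(f)) * dt
    # a and b: product functions convolved with the low-pass window h.
    a = np.convolve(f * np.cos(omega_n * t), h)[:len(f)] * dt
    b = np.convolve(f * np.sin(omega_n * t), h)[:len(f)] * dt
    magnitude = np.sqrt(a ** 2 + b ** 2)   # Equation 13a
    phase = np.arctan2(-b, a)              # Equation 13b, four-quadrant form
    return a, b, magnitude, phase


dt = 1.0 / 8000.0
t = np.arange(1600) * dt
h = np.hanning(400)
h = h / (h.sum() * dt)                     # normalise so the window integrates to 1
f = 2.0 * np.cos(2 * np.pi * 1000 * t + 0.7)

a, b, magnitude, phase = analyze_channel(f, h, 2 * np.pi * 1000, dt)
print(magnitude[-1], phase[-1])
```

For a steady cosine of amplitude A and phase offset 0.7 at the analysis frequency, the magnitude settles near A/2 and the phase near 0.7 once the window has filled, because the double-frequency product terms are removed by the low-pass window.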

A straightforward time differentiation of the $\varphi_n$ yields the $\dot{\varphi}_n$ function. Alternatively, the time derivative of the phase angle may be approximated in a manner that avoids the necessity for evaluating the inverse tangent function required in the definition given by Equation (13b). Since the time derivative of $\tan^{-1}x$ is defined to be

$$\frac{d}{dt}\tan^{-1}x = \frac{1}{1+x^2}\,\frac{dx}{dt},$$

the time derivative of the phase angle of Equation 13b is

$$\dot{\varphi}_n = \frac{b\,\dot{a} - a\,\dot{b}}{a^2 + b^2} \tag{14}$$

where a and b denote $a(\omega_n,t)$ and $b(\omega_n,t)$.

Referring now to FIG. 1, an incoming speech signal f(t) from source 10, where source 10 may be a conventional microphone of any desired variety, is applied in parallel to N analyzers 1-1 through 1-N. Within analyzer 1-1 the incoming speech signal is delivered in parallel to multipliers 101a-1 and 101b-1. In accordance with the requirements of Equation (12a), multiplier 101a-1 is also supplied with a fixed frequency cosine wave $\cos\omega_1 t$ from cosine function generator 102a-1, and multiplier 101b-1 is also supplied with a fixed frequency sine wave $\sin\omega_1 t$ from sine function generator 102b-1. Similarly, in the other analyzers through 1-N the incoming speech signal is multiplied by sine and cosine waves, as shown for example in analyzer 1-N by the multiplication of f(t) with $\cos\omega_N t$ and $\sin\omega_N t$. The frequencies of the sine and cosine waves by which the speech signal is multiplied in each analyzer are chosen in a manner similar to the frequency spacing of the filters in a channel vocoder. For example, for N equal to 15, and for a speech signal bandwidth extending from 100 to 3100 cycles per second, the frequencies $\omega_1$ through $\omega_N$ may be located on the frequency scale at intervals of 200 cycles per second, beginning with $\omega_1$ fixed at 200 cycles per second and ending with $\omega_N$ fixed at 3,000 cycles per second.

The product signals developed by multipliers 101a-1 and 101b-1 in analyzer 1-1 are passed through low-pass filters 103a-1 and 103b-1, those filters having an identical predetermined impulse response h(t) so that the output signals of filters 103a-1 and 103b-1 correspond to the convolution terms specified by Equation 12a. A signal $|F_1|$ representative of the magnitude of the short-time amplitude spectrum at frequency $\omega_1$ is then obtained by squaring the convolution output signals of filters 103a-1 and 103b-1 in suitable squaring circuits 104a-1 and 104b-1, combining the squared signals in adder 105-1 and obtaining the square root of the sum signal in square root taking circuit 106-1. Squaring circuits 104a-1 and 104b-1 and square root taking circuit 106-1 may be of the type shown in W. J. Karplus and W. W. Soroka, Analog Methods, pages 78-81 (2d ed. 1959). Simultaneously, a signal $\dot{\varphi}_1$ representative of the time derivative of the short-time phase spectrum at frequency $\omega_1$ is obtained by passing the output signals of filters 103a-1 and 103b-1 through inverse tangent function generator 107-1 followed by differentiator 108-1. The output signals of circuits 106-1 and 108-1 are smoothed by low-pass filters 109a-1 and 109b-1 to obtain a pair of narrow band control signals respectively representative of the quantities $|F_1|$ and $\dot{\varphi}_1$. Similarly, each of the other analyzers through 1-N produces a pair of narrow band control signals representative of a selected point on the amplitude spectrum and a selected point on the time derivative of the phase spectrum; thus as shown in FIG. 1, analyzer 1-N produces a pair of control signals representative of $|F_N|$ and $\dot{\varphi}_N$ evaluated at the frequency $\omega_N$. The N pairs of control signals developed at the transmitter terminal by analyzers 1-1 through 1-N thus represent in coded form the information content of the incoming speech signal. The collective bandwidth of these N pairs of control signals is substantially smaller than that of the incoming speech signal f(t); hence the N pairs of control signals may be transmitted via a transmission medium having a substantially narrower bandwidth than that required for facsimile transmission of the incoming speech signal.

It is observed that the transmitter terminal of this invention does not include any pitch detection apparatus or other equipment usually required for distinguishing between voiced and unvoiced portions of the incoming speech signal or for measuring the periodicity of voiced portions of the speech signal, thereby eliminating a common source of errors. The absence of such equipment is due to the representation of speech in terms of both its amplitude spectrum and its phase spectrum, which, as shown in Equations 9, 11a and 11b, completely specifies the speech signal without requiring supplementary information regarding other speech characteristics.

Turning now to FIG. 2, this drawing illustrates alternative apparatus for obtaining the time derivative of points on the phase spectrum in accordance with Equation 14. Each of the output signals from filters 103a-1 and 103b-1, denoted a and b, respectively, is passed through a conventional differentiator 301a-1 and 301b-1, and each of the resulting differentiated signals, denoted $\dot{a}$ and $\dot{b}$, respectively, is multiplied in multipliers 302a-1 and 302b-1 by the undifferentiated signals b and a from filters 103b-1 and 103a-1, respectively. The product signals thereby developed by multipliers 302a-1 and 302b-1 are therefore representative of the terms $(b\dot{a})$ and $(a\dot{b})$ in the numerator on the right-hand side of Equation 14. These two product signals are then combined in subtracting circuit 303-1 to obtain a difference signal proportional to $(b\dot{a} - a\dot{b})$. This difference signal is then divided in divider 304-1 by the output signal of adder 105-1, so that the quotient signal appearing at the output terminal of divider 304-1 represents the first derivative of the short-time phase spectrum as specified by Equations 10a and 10b.
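The FIG. 2 arrangement (differentiate, cross-multiply, subtract, divide by the sum of squares) maps onto a short numerical routine. The sketch below is an assumption-laden illustration: it uses `np.gradient` as the differentiator, and the small `floor` term guarding against division by zero is an addition of this example, not part of the patent.

```python
import numpy as np


def phase_derivative(a, b, dt, floor=1e-12):
    """Phase derivative as (b*a_dot - a*b_dot) / (a^2 + b^2), with no inverse tangent.

    a, b  : sampled low-pass filter outputs (the a and b of the text)
    dt    : sample interval in seconds
    floor : small constant added to the denominator (assumption of this sketch)
    """
    a_dot = np.gradient(a, dt)   # stands in for differentiator 301a
    b_dot = np.gradient(b, dt)   # stands in for differentiator 301b
    return (b * a_dot - a * b_dot) / (a ** 2 + b ** 2 + floor)


dt = 1.0 / 8000.0
t = np.arange(800) * dt
phi = 2 * np.pi * 5 * t          # phase advancing at 5 cycles per second
a, b = np.cos(phi), -np.sin(phi) # so that a - jb = exp(j*phi)

estimate = phase_derivative(a, b, dt)
print(estimate[1:-1].mean() / (2 * np.pi))
```

Here the estimate is compared against a phase whose derivative is known in advance; the end samples are excluded because `np.gradient` falls back to one-sided differences there.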

Returning now to FIG. 1, the N pairs of control signals are transmitted over a suitable medium to a receiver terminal, where each pair of control signals is applied to a corresponding synthesizer 1-1 through 1-N. Within synthesizer 1-1, for example, the phase derivative control signal $\dot{\varphi}_1$ is applied to a frequency-modulated oscillator 21-1, where oscillator 21-1 may be of any conventional design for producing a cosine wave at the fixed frequency $\omega_1$ which is modulated by the incoming phase derivative control signal to produce an output cosine wave having an argument $(\omega_1 t + \int\dot{\varphi}_1\,dt)$ in accordance with Equations 11a and 11b. A suitable oscillator is described in F. Terman, Radio Engineering, pages 493-499 (3d ed. 1947). This frequency-modulated cosine wave is applied to a multiplier 22-1 where it is multiplied together with the incoming magnitude control signal, $|F_1|$, thereby producing a product output signal proportional to the first term $f_1(t)$ of Equation 9. Correspondingly, the other synthesizers through 1-N develop at their output terminals product signals proportional to the other terms of Equation 9. These product signals are then additively combined by connecting the output terminals of synthesizers 1-1 through 1-N in parallel to the input terminal of adder 23 to form at the output terminal of adder 23 a sum signal that is a close replica of the original speech signal, as specified by Equations 1 and 9. A reproducer 24, which may be of any desired construction, converts the replica signal into audible sound.
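The receiver operation described above (a frequency-modulated cosine per channel, scaled by the magnitude signal, then summed) can be sketched for sampled control signals. This is a hedged illustration: the function name, the array layout (one row per channel), the running-sum integrator, and the optional initial phases are all assumptions of the example rather than details from the patent.

```python
import numpy as np


def synthesize(magnitudes, phase_derivatives, omegas, dt, initial_phases=None):
    """Sum over channels of |F_n| cos(omega_n t + integral of the phase derivative).

    magnitudes, phase_derivatives : arrays of shape (N, T), one row per channel
    omegas                        : N fixed oscillator frequencies in rad/s
    dt                            : sample interval in seconds
    """
    magnitudes = np.atleast_2d(np.asarray(magnitudes, dtype=float))
    phase_derivatives = np.atleast_2d(np.asarray(phase_derivatives, dtype=float))
    n_channels, n_samples = magnitudes.shape
    if initial_phases is None:
        initial_phases = np.zeros(n_channels)
    t = np.arange(n_samples) * dt
    output = np.zeros(n_samples)
    for n in range(n_channels):
        pd = phase_derivatives[n]
        # Left-rectangle running integral: sample k holds the sum of pd[0..k-1] * dt.
        phase = initial_phases[n] + (np.cumsum(pd) - pd) * dt
        # Frequency-modulated oscillator followed by the magnitude multiplier.
        output += magnitudes[n] * np.cos(omegas[n] * t + phase)
    return output


dt = 1.0 / 8000.0
n_samples = 400
omegas = 2 * np.pi * np.array([200.0, 400.0])
mags = np.vstack([np.full(n_samples, 0.5), np.full(n_samples, 0.25)])
pds = np.vstack([np.zeros(n_samples), np.full(n_samples, 2 * np.pi * 10.0)])
replica = synthesize(mags, pds, omegas, dt)
print(replica[:3])
```

In this usage, the second channel's constant phase derivative simply shifts its oscillator from 400 to 410 cycles per second, which is the frequency-modulation reading of Equations 11a and 11b.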

Although this invention has been described in terms of a speech communication system of the type shown in FIG. 1, it is to be understood that applications of the principles of this invention are not limited to the field of speech communication, but include the fields of automatic speech recognition, speech processing and automatic message recording and reproduction. In addition, it is to be understood that the above-described arrangements are merely illustrative of the numerous arrangements which may be devised from the principles of this invention by those skilled in the art without departing from the spirit and scope of the invention.

What is claimed is:

1. A bandwidth compression system that comprises a transmitter terminal including a source of an incoming speech signal, and

a plurality of analyzers with input terminals connected in parallel to said source for deriving from said speech signal a corresponding plurality of pairs of magnitude and phase angle control signals by evaluating the short-time amplitude spectrum and short-time phase spectrum of said speech signal at a corresponding plurality of preset frequencies that span the frequency band of said speech signal,

means for transmitting each pair of magnitude and phase angle control signals to a receiver station, and at said receiver station,

a plurality of synthesizers having output terminals connected to an adding means, each synthesizer being supplied with one of said pairs of magnitude and phase angle control signals, wherein each of said synthesizers generates at its output terminal a cosine wave having the same preset frequency as the corresponding one of said analyzers and having magnitude and phase angle respectively specified by said pair of magnitude and phase angle control signals, and

wherein said adding means combines said cosine wave generated by each of said synthesizers to form a replica of said incoming speech signal.

2. Apparatus for encoding a speech wave in terms of a selected plurality of N values of the short-time amplitude spectrum of said speech wave and a corresponding plurality of N selected values of the time derivative of the short-time phase spectrum of said speech wave which comprises a source of an incoming speech wave, and a plurality of N analyzers having input terminals connected in parallel to said source for deriving N magnitude control signals and N phase control signals, wherein said nth analyzer, n = 1, 2, ..., N, comprises means for generating a cosine wave at a preset frequency $\omega_n$,

means for generating a sine wave at said preset frequency $\omega_n$,

a first multiplier supplied with said cosine wave and said speech wave for developing a first product signal proportional to the product of said cosine wave and said speech wave,

a second multiplier in parallel with said first multiplier and supplied with said sine wave and said speech wave for developing a second product signal proportional to the product of said sine wave and said speech wave,

a first low-pass filter having a preset impulse response for developing from said first product signal a first convolution signal representative of the convolution of said first product signal with said impulse response,

a second low-pass filter having the same preset impulse response as said first filter for developing from said second product signal a second convolution signal representative of the convolution of said second product signal with said impulse response,

means having an input terminal connected to said first and second low-pass filters for deriving a magnitude control signal representative of said amplitude spectrum of said speech wave at said frequency $\omega_n$ which comprises a first squaring circuit supplied with said first convolution signal for generating a first output signal representative of said first convolution signal raised to the second power,

a second squaring circuit supplied with said second convolution signal for deriving a second output signal representative of said second convolution signal raised to the second power,

adding means for combining said first and second output signals to produce a sum signal,

square root taking means for obtaining a square root signal proportional to the square root of said sum signal,

and means for smoothing said square root signal to produce said magnitude control signal,

and means for deriving a phase control signal representative of the time derivative of said phase spectrum of said speech wave at said frequency $\omega_n$,

whereby said N magnitude control signals and said N phase control signals derived by said N analyzers respectively represent said selected plurality of N values of said short-time amplitude spectrum and N values of said short-time phase spectrum of said speech wave at N preset frequencies $\omega_1, \omega_2, \ldots, \omega_N$.

3. Apparatus as defined in claim 2 wherein said means for deriving a phase control signal comprises means supplied with said first convolution signal and said second convolution signal for generating an inverse tangent signal representative of the inverse tangent of the negative value of said second convolution signal divided by said first convolution signal,

means for differentiating said inverse tangent signal with respect to time to obtain a derivative signal, and

means for smoothing said derivative signal to obtain said phase control signal.

4. Apparatus as defined in claim 2 wherein said means for deriving a phase control signal comprises first differentiating means for differentiating said first convolution signal with respect to time thereby to obtain a first differentiated signal, second differentiating means for differentiating said second convolution signal with respect to time thereby to obtain a second differentiated signal,

a third multiplier following said first differentiating means and supplied with said second convolution signal for generating a third product signal proportional to the product of said second convolution signal and said first differentiated signal,

a fourth multiplier following said second differentiating means and supplied with said first convolution signal for generating a fourth product signal proportional to the product of said first convolution signal and said second differentiated signal,

means for subtracting said fourth product signal from said third product signal to obtain a difference signal,

means for dividing said difference signal by said sum signal to obtain a quotient signal,

and means for smoothing said quotient signal to produce said phase control signal.

5. Apparatus for constructing a replica of an original speech wave from a plurality of N amplitude control signals and N phase control signals respectively representative of N selected values of the short-time amplitude spectrum of said speech wave and N selected values of the time derivative of the short-time phase spectrum of said speech wave which comprises a source of N amplitude control signals, denoted $|F_1|, |F_2|, \ldots, |F_N|$, respectively representative of N selected values of said short-time amplitude spectrum of said speech wave at predetermined frequencies $\omega_1, \omega_2, \ldots, \omega_N$,

a source of N phase control signals, denoted $\dot{\varphi}_1, \dot{\varphi}_2, \ldots, \dot{\varphi}_N$, respectively representative of N selected values of the time derivative of said short-time phase spectrum of said speech wave at said N predetermined frequencies $\omega_1, \omega_2, \ldots, \omega_N$,

a plurality of N parallel synthesizers, each synthesizer being supplied with one of said amplitude control signals and one of said phase control signals, said nth synthesizer, n = 1, 2, ..., N, comprising a frequency modulated oscillator responsive to said nth phase control signal $\dot{\varphi}_n$ for generating an nth cosine wave having a predetermined frequency $\omega_n$ and a phase angle controlled by said nth phase control signal,

a multiplier provided with two input terminals and one output terminal, one of said input terminals being connected to said frequency modulated oscillator and the other of said input terminals receiving said nth magnitude control signal $|F_n|$, wherein said multiplier develops at its output terminal an nth product cosine signal having an amplitude determined by said nth magnitude control signal,

an adding means for combining the product cosine signal developed at the output terminal of the multiplier in each of said synthesizers to form a sum signal representative of the sum of N product cosine signals,

and reproducing means for converting said sum signal into audible sound.

References Cited

UNITED STATES PATENTS

2,953,644 9/1960 Miller 179-15.55
